library(tidyverse)
library(readxl)
library(janitor)
library(here)
library(lubridate)
library(plotly)
library(mosaic)
library(datapasta)
library(patchwork)
# loading in data
meteorological_data <- read_xlsx(here("data", "Sensor Data", "Meteorological Data.xlsx"))
sensor_data <- read_xlsx(here("data", "Sensor Data", "Sensor Data.xlsx"))
lekagul_sensor_data <- read_csv(here("data", "Traffic Data", "Lekagul Sensor Data.csv"))
# basic data cleaning
meteorological_data <- meteorological_data %>%
  clean_names() %>%
  select(-x4)
sensor_data <- sensor_data %>%
  clean_names() %>%
  mutate(monitor = as.factor(monitor))
lekagul_sensor_data <- lekagul_sensor_data %>%
  clean_names() %>%
  mutate(car_type = as.factor(car_type))
#creating traffic data with coordinates

gate_name <- c("ranger-stop1", "gate2", "entrance1", "ranger-stop4", "camping5", "camping4", "camping2", "camping3", "camping0", "entrance0", "general-gate1", "gate0", "gate1", "general-gate7", "gate7", "general-gate4", "ranger-stop2", "ranger-stop0", "general-gate0", "general-gate2", "camping1", "general-gate5", "general-gate6", "ranger-stop6", "gate6", "entrance3", "entrance4", "gate8", "camping6", "ranger-base", "gate5", "gate3", "ranger-stop3", "camping8", "general-gate3", "entrance2", "gate4", "ranger-stop5", "camping7", "ranger-stop7")

X <- c(20, 25, 18, 19, 21, 49, 45, 46, 53, 63, 65, 64, 59, 66, 97, 70, 81, 90, 110, 104, 129, 124, 136, 123, 116, 115, 140, 138, 150, 128, 131, 149, 148, 183, 186, 183, 164, 151, 181, 100)

Y <- c(175, 145, 132, 104, 79, 110, 135, 131, 158, 186, 174, 166, 155, 56, 40, 102, 164, 183, 190, 167, 149, 89, 63, 53, 49, 33, 16, 19, 23, 25, 54, 139, 154, 151, 144, 113, 86, 82, 55, 48)

gates <- data.frame(gate_name, X, Y)

#join gates dataframe with lekagul_sensor data frame by the name of gate
lekagul_traffic <- inner_join(lekagul_sensor_data, gates, by = "gate_name")

#splitting timestamps into date and time variables
library(lubridate)
lekagul_traffic <- lekagul_traffic %>% 
  mutate(date = as_date(timestamp), 
         time_12 = format(timestamp, "%I:%M %p"),
         datetime = as.character(timestamp))

For Data Challenge 3 we were asked as experts in visual analytics to help Mitch Vogel analyze these datasets since he has been discovering signs that the number of nesting pairs of the Rose-Crested Blue Pipit is decreasing. Something is suspicious, but what?

knitr::include_graphics("https://www.allaboutbirds.org/guide/assets/photo/297326811-1280px.jpg")
Rose-breasted Grosbeak by Tom Snow, Macaulay Library

Rose-breasted Grosbeak by Tom Snow, Macaulay Library

What does the Traffic Data tell us?

“Patterns of Life” analyses depend on recognizing repeating patterns of activities by individuals or groups. Describe some of the daily patterns of life you observe in the vehicles traveling through and within the park. Characterize the patterns by describing:

the kinds of vehicles

their spatial activities (where do they go?)

their temporal activities (when does the pattern happen?)

and provide a hypothesis of what the pattern represents (for example, if I drove to a coffee house every morning, but did not stay for long, you might hypothesize I’m getting coffee “to-go”).

Some patterns may appear over longer periods of time (in this case, over multiple days). Describe a few patterns of life that occur over longer time periods by vehicles traveling through and within the park. You may want to use the same what-where-when breakdown described above to frame your description.

In an attempt to break down the patterns of different vehicles, we decided to look at each vehicle type and the typical number of movements for that type. If there were any car ids that stuck out as having significantly more movements that the other types of similar vehicles, we looked into their movements more closely. Overall, type 1 and 2 vehicles had the most IDs with unusual numbers of movements.

Type 1 Vehicles

Filtering the traffic data to only include type 1 vehicles produces the following output:

#filtering by type 1 cars
lekagul_traffic %>% 
  filter(car_type == "1") %>% 
  group_by(car_id) %>% 
  count(car_id, sort = TRUE) 
## # A tibble: 7,487 x 2
## # Groups:   car_id [7,487]
##    car_id                 n
##    <chr>              <int>
##  1 20154112014114-381    98
##  2 20155705025759-63     70
##  3 20162904122951-717    36
##  4 20155308075334-739    26
##  5 20155820075838-714    25
##  6 20161515101514-592    25
##  7 20152605012634-824    24
##  8 20150018080022-664    20
##  9 20150509050552-677    20
## 10 20151613111653-93     20
## # … with 7,477 more rows

As we can see, the typical range for type 1 vehicles is about 14 to about 25 movements. ID’s 20154112014114-381 and 20155705025759-63 have significantly more movement than other vehicle type 1s with 98 and 70 movements, respectively. Additionally, ID 20162904122951-717 has more than the others with 36 movements.

ID 20154112014114-381 (98 movements)

Upon further examination, this vehicle enters at entrance 0, goes to general-gate 1, ranger-stop 2, ranger-stop 0, general-gate 2, general-gate 5, then camping 6. Two days later around 10:30 PM, it leaves camping 6, going back the exact way it came in. It repeats this exact same shift every week. This movement is thus very regular, pointing to perhaps a maintanence worker or park ranger of some sort. See animation below.

#make graph to show the pattern above
fig4 <- lekagul_traffic %>% 
  filter(car_id == "20154112014114-381") %>% 
  plot_ly(x = ~X,
    y = ~Y,
    frame = ~datetime,
    type = 'scatter',
    mode = 'markers',
    color = ~car_id,
    colors = "red") %>%
    layout(
      title = "Movements of ID 20154112014114-381",
      images = list(source = "https://raw.githubusercontent.com/mariumtapal/sds325-dc3/master/data/Sensor%20Data/MapLargeLabels.jpg",
           xref = "x",
           yref = "y",
           x = 0,
           y = 200,
           sizex = 200,
           sizey = 200,
           sizing = "stretch",
           opacity = 1,
           layer = "below")) 

fig4 <- fig4 %>% layout(
    xaxis = list(range = c(0, 200)),
    yaxis = list(range = c(0, 200)))
fig4

ID 20155705025759-63 (70 movements)

This id is kind of strange because it enters the park from entrance 0 on 6/05/2015, and then it never leaves. It goes around to different camping areas, stopping at each one (except for camping 7 and 8) for close to a month, except for camping 1, where it only stays for 15 minutes. The last data point for this vehicle is from 5/20/2016, where it has just entered camping 5. See animation below.

fig5 <- lekagul_traffic %>% 
  filter(car_id == "20155705025759-63") %>% 
  plot_ly(x = ~X,
    y = ~Y,
    frame = ~datetime,
    type = 'scatter',
    mode = 'markers',
    color = ~car_id,
    colors = "yellow") %>%
    layout(
      title = "Movements of ID 20155705025759-63",
      images = list(source = "https://raw.githubusercontent.com/mariumtapal/sds325-dc3/master/data/Sensor%20Data/MapLargeLabels.jpg",
           xref = "x",
           yref = "y",
           x = 0,
           y = 200,
           sizex = 200,
           sizey = 200,
           sizing = "stretch",
           opacity = 1,
           layer = "below")) 

fig5 <- fig5 %>% layout(
    xaxis = list(range = c(0, 200)),
    yaxis = list(range = c(0, 200)))
fig5

This vehicle often goes down through general gates 4 and 7 before going back up to the campgrounds, putting it in range of the sensors. Maybe this is a case of which roads can be used by which vehicles, but it does seem a little odd. Also, general-gate 7 is the destination this vehicle goes to the most. See figure below.

#bar graph of how many times car id 20155705025759-63 went to each place
lekagul_traffic %>% 
  filter(car_id == "20155705025759-63") %>% 
  ggplot(mapping = aes(x = gate_name)) +
  geom_bar()+
  theme(axis.text.x = element_text(angle = 270)) +
  labs(title = "Frequency of destinations for ID 20155705025759-63")

ID 20162904122951-717 (36 movements)

This vehicle repeats the same pattern over and over, entering at entrance 3, going through general-gate 7, and then going up to camping 0 where it stays for over two days before going back the same way. It repeats this path weekly. It seems like going through general-gate 7 is definitely not the most direct way to camping 0 from entrance 3, which seems kind of suspicious. Again, general-gate 7 is near several of the sensors, as is entrance 3. See animation below.

fig5.5 <- lekagul_traffic %>% 
  filter(car_id == "20162904122951-717") %>% 
  plot_ly(x = ~X,
    y = ~Y,
    frame = ~datetime,
    type = 'scatter',
    mode = 'markers',
    color = ~car_id,
    colors = "green") %>%
    layout(
      title = "Movements of ID 20162904122951-717",
      images = list(source = "https://raw.githubusercontent.com/mariumtapal/sds325-dc3/master/data/Sensor%20Data/MapLargeLabels.jpg",
           xref = "x",
           yref = "y",
           x = 0,
           y = 200,
           sizex = 200,
           sizey = 200,
           sizing = "stretch",
           opacity = 1,
           layer = "below")) 

fig5.5 <- fig5.5 %>% layout(
    xaxis = list(range = c(0, 200)),
    yaxis = list(range = c(0, 200)))
fig5.5

Type 2 Vehicles

Filtering for Type 2 vehicles produces the following output:

#filtering by type 2 cars
lekagul_traffic %>% 
  filter(car_type == "2") %>% 
  group_by(car_id) %>% 
  count(car_id, sort = TRUE)
## # A tibble: 4,717 x 2
## # Groups:   car_id [4,717]
##    car_id                 n
##    <chr>              <int>
##  1 20154519024544-322   281
##  2 20151818111845-978    27
##  3 20150028050007-273    24
##  4 20151204111220-934    20
##  5 20151415091432-219    20
##  6 20154026094053-772    20
##  7 20154202114234-259    20
##  8 20154328044302-517    20
##  9 20155517055510-758    20
## 10 20155012015018-20     19
## # … with 4,707 more rows

ID 20154519024544-322, which we looked at above, has the most movements by far at 281. Typical movements for the rest of type 2 vehicles range from 12 to about 25.

ID 20154519024544-322 (281 movements)

This vehicle takes the same route every four days or so, entering entrance 4, going to general-gate 5, general-gate 2, ranger-stop 0, ranger-stop2, general-gate 1, general-gate 4, general-gate 7, and ending at camping 4, where they stay for a couple days before going back the way they came in. So this pattern appears regular and not necessarily suspicious, but it is weird that this vehicle travels so much more than any others. Like we iterated above, it could be a regular delivery.

One thing we also noticed is that after going to general-gate 1, instead of going through gates 0 and 1 to get to camping 4, the vehicle goes down to general-gate 4 and general-gate 7 before coming back up to camping 4. Similar to IDs above, this route feels indirect, and it puts this vehicle in range of the sensors when it is down by general-gate 7. See animation below.

#animated plot of all of 20154519024544-322 movements
#include this graph
fig6 <- lekagul_traffic %>% 
  filter(car_id == "20154519024544-322") %>% 
  plot_ly(x = ~X,
    y = ~Y,
    frame = ~datetime,
    type = 'scatter',
    mode = 'markers',
    color = ~car_id,
    colors = "hot pink") %>% 
    layout(
      title = "Movements of ID 20154519024544-322",
      images = list(source = "https://raw.githubusercontent.com/mariumtapal/sds325-dc3/master/data/Sensor%20Data/MapLargeLabels.jpg",
           xref = "x",
           yref = "y",
           x = 0,
           y = 200,
           sizex = 200,
           sizey = 200,
           sizing = "stretch",
           opacity = 1,
           layer = "below")) 

fig6 <- fig6 %>% layout(
    xaxis = list(range = c(0, 200)),
    yaxis = list(range = c(0, 200)))
fig6

Type 3 Vehicles

Filtering for type 3 vehicles produced the following output:

#filtering by type 3 cars
lekagul_traffic %>% 
  filter(car_type == "3") %>% 
  group_by(car_id) %>% 
  count(car_id, sort = TRUE) 
## # A tibble: 3,039 x 2
## # Groups:   car_id [3,039]
##    car_id                 n
##    <chr>              <int>
##  1 20153408043401-757    25
##  2 20153628063656-228    24
##  3 20150319120301-287    20
##  4 20150724120712-867    20
##  5 20151209021212-232    20
##  6 20151804041847-645    20
##  7 20152414062411-153    20
##  8 20152606022604-255    20
##  9 20155117065115-489    19
## 10 20150014030033-497    18
## # … with 3,029 more rows

The two IDs with the most movements, 20153408043401-757 and 20153628063656-228 do not differ from the others by all that much, but we looked into them anyway. The typical range of movements for type 3 vehicles is between 11 and 20.

ID 20153408043401-757 (25 movements)

This vehicle’s movement does not seem overly suspicious, although they do spend about 14 hours at camping 5 before going back almost the way they came. Instead of exiting where they entered, at entrance 3, they exit entrance 4. This is potentially interesting because entrance 3 is right around the factories and sensors. See animation below.

#plot of movements of id 20153408043401-757
fig7 <- lekagul_traffic %>% 
  filter(car_id == "20153408043401-757") %>% 
  plot_ly(x = ~X,
    y = ~Y,
    frame = ~datetime,
    type = 'scatter',
    mode = 'markers',
    color = ~car_id) %>%
    layout(
      title = "Movements of ID 20153408043401-757",
      images = list(source = "https://raw.githubusercontent.com/mariumtapal/sds325-dc3/master/data/Sensor%20Data/MapLargeLabels.jpg",
           xref = "x",
           yref = "y",
           x = 0,
           y = 200,
           sizex = 200,
           sizey = 200,
           sizing = "stretch",
           opacity = 1,
           layer = "below")) 

fig7 <- fig7 %>% layout(
    xaxis = list(range = c(0, 200)),
    yaxis = list(range = c(0, 200)))
fig7

ID 20153628063656-228

Like ID 20154519024544-322, this vehicle also goes to camping 4 by way of general-gate 7 and is at camping 4 for 10 days. Except instead of exiting entrance 0 where they came in, they go out entrance 2 on the other side of the park, which is strange for how out of the way entrance 2 seems to the rest of their movements. See animation below.

#plot of movements of id 20153628063656-228
fig8 <- lekagul_traffic %>% 
  filter(car_id == "20153628063656-228") %>% 
  plot_ly(x = ~X,
    y = ~Y,
    frame = ~datetime,
    type = 'scatter',
    mode = 'markers',
    color = ~car_id) %>%
    layout(
      title = "Movements of ID 20153628063656-228",
      images = list(source = "https://raw.githubusercontent.com/mariumtapal/sds325-dc3/master/data/Sensor%20Data/MapLargeLabels.jpg",
           xref = "x",
           yref = "y",
           x = 0,
           y = 200,
           sizex = 200,
           sizey = 200,
           sizing = "stretch",
           opacity = 1,
           layer = "below")) 

fig8 <- fig8 %>% layout(
    xaxis = list(range = c(0, 200)),
    yaxis = list(range = c(0, 200)))
fig8

Other Vehicle Types (2P, 4, 5, and 6)

For all other vehicle types, we did not see any IDs with highly unusual numbers of movements for their type.

For type 2P, the typical range of movements was between 6 and 49.

For type 4, the typical range of movements was between 3 and 14.

For types 5 and 6, the typical range of movements was between 2 and 9.

Some activities may deviate from an established pattern or are just difficult to explain from what you know of a situation. Describe any unusual patterns (either single day or multiple days) and highlight why you find them unusual.

What are the top 3 patterns you discovered that you suspect could be most impactful to bird life in the preserve?

In looking at the multi-day movements of different vehicles in the preserve, we noticed that several of the vehicles with large numbers of movements passed through general-gate 7 when it seemed somewhat out of their way to do so. Because general-gate 7 is nearby several of the sensors, we thought this could be a location for chemical dumping, which would heavily impact the bird life.

Additionally, there were a couple vehicles that went in or out entrance 3, which is right around all of the factories and sensors and could be further related to chemical dumping.

What does the Sensor Data tell us?

Turning your attention to the sensor data, characterize the sensors’ performance and operation. Are they all working properly?

Can you detect any unexpected behaviors of the sensors by analyzing the readings they capture?

What about the chemicals did we find?

Which chemicals are being detected by the sensor group?

p1 <- sensor_data %>%
  group_by(chemical) %>%
  ggplot(aes(x = date_time, y = reading, color = chemical)) +
  geom_line(aes(group = chemical), alpha = 0.4) +
  labs(title = "Sensor Readings by Chemical Type", x = "Month", y = "Monitor Reading")
ggplotly(p1)

In the interactive plot above, you can see the readings for each chemical alone by double clicking its name in the legend. Through a first look, the readings of AGOC-3A and Methylosmolene seem very volatile compared to the Chlorodinine and Appluimonia.

We think these are more suspicious than the other two because the ranges of values are way larger as shown by the summary statistics below. While this volatility could be explained by some other factors as the concentration of chemicals (measured in parts per million) can be different for different chemicals, we still think that something is happening. The quartiles, means, and standard deviation are small, but the range is comparatively very large.

# make subsetted datasets for summary stats
methylosmolene <- sensor_data %>% filter(chemical == "Methylosmolene")
agoc3a <- sensor_data %>% filter(chemical == "AGOC-3A")
chlorodinine <- sensor_data %>% filter(chemical == "Chlorodinine")
appluimonia <- sensor_data %>% filter(chemical == "Appluimonia")

# make dataframe of summary stats
summary_stats <- rbind(fav_stats(methylosmolene$reading), fav_stats(agoc3a$reading), fav_stats(chlorodinine$reading), fav_stats(appluimonia$reading))

# add chemical name manually
summary_stats <- summary_stats %>% mutate(chemical = c("Methylosmolene", "AGOC-3A", "Chlorodinine", "Appluimonia"))

# calculate range
summary_stats <- summary_stats %>% mutate(range = max - min)

# reorder columns
summary_stats <- summary_stats[, c(10, 11, 1, 2, 3, 4, 5, 6, 7, 8, 9)]

# fix row names
rownames(summary_stats)<-c("1","2","3","4")
# print stats
summary_stats %>%
  select(-c(n, missing)) %>%
  knitr::kable() %>%
  kableExtra::kable_minimal(full_width = FALSE)
chemical range min Q1 median Q3 max mean sd
Methylosmolene 100.77540 0.0010029 0.1946140 0.386876 0.735009 100.77640 0.7194503 2.5474691
AGOC-3A 101.10456 0.0010247 0.1941645 0.403403 0.785452 101.10558 0.8947713 3.1701920
Chlorodinine 15.72209 0.0010153 0.1918200 0.397809 0.754519 15.72311 0.6440396 0.8516301
Appluimonia 10.14664 0.0010538 0.1925800 0.394182 0.734280 10.14769 0.5992801 0.6657862

Perhaps this extra volatility/concentration explains the dumping of the chemical in river.

What patterns of chemical releases do you see?

Investigating further into these chemicals, they are most commonly detected on monitor 6, and then monitor on 3.

Methylosmolene

From the “Sensor Readings by Chemical Type” graph above, it looks like an average reading for Methylosmolene is about 4 parts per million. We filter for the dataset due to limited computational power: am filtering to check for monitors

d1 <- sensor_data %>% filter(chemical == "Methylosmolene" & reading >= 4)
ggplotly(ggplot(d1, aes(x = date_time, y = reading, color = monitor)) +
  geom_point() + labs(x = "Month", y = "Monitor Reading", 
                       title = "Sensor Readings by Monitor for Methylosmolene"))

Most of the higher readings are coming from monitor 6 in April and December!

AGOC-3A

Similar to Methylosmolene, the average reading also looks like about 4 parts per million.

d2 <- sensor_data %>% filter(chemical == "AGOC-3A" & reading >= 4)
ggplotly(ggplot(d2, aes(x = date_time, y = reading, color = monitor)) +
  geom_point() + labs(x = "Month", y = "Monitor Reading", 
                       title = "Sensor Readings by Monitor for AGOC-3A"))

Again, a lot of the higher readings are at monitor 6, especially in April!

Which factories are responsible for which chemical releases?

Through an investigation of the financial data of these factories reported in the semi-annual newsletters, we are suspicious that the Kasios Office Furniture and Radiance ColourTek are involved in some way.

# create data frame with {datapasta} addin
finance <- data.frame(
  stringsAsFactors = FALSE,
  Year = c(
    "2012-12-01",
    "2013-03-01", "2013-06-01", "2013-10-01", "2013-12-01",
    "2014-03-01", "2014-06-01", "2014-10-01", "2014-12-01",
    "2015-03-01", "2015-06-01", "2015-10-01", "2015-12-01",
    "2016-03-01", "2016-06-01", "2016-10-01"
  ),
  Indigo = c(
    4.32, 3.56, 2.95, 3.28,
    4.5, 4.16, 3.26, 3.87, 4.91, 3.83, 3.65, 4.24, 5.21,
    4.74, 4.21, 4.57
  ),
  Kasios = c(
    6.24, 6.37, 6.17, 5.49,
    5.23, 4.92, 4.18, 3.25, 2.65, 2.94, 3.48, 4.9, 5.61,
    6.84, 7.23, 7.85
  ),
  Radiance = c(
    7.18, 7.32, 7.09, 7.21,
    7.37, 7.63, 7.29, 8.02, 7.53, 4.18, 4.36, 4.22, 4.89,
    4.71, 5.03, 5.13
  ),
  Roadrunner = c(
    5.53, 5.81, 6.34, 6.73,
    7.14, 6.92, 6.84, 6.95, 6.69, 6.74, 6.53, 6.71, 6.82,
    6.59, 6.71, 6.68
  )
)
finance$Year <- parse_date_time(finance$Year, "Ymd")

# plot
Indigo <- ggplot(finance, aes(x = Year, y = Indigo)) +
  geom_line()
Kasios <- ggplot(finance, aes(x = Year, y = Kasios)) +
  geom_line()
Radiance <- ggplot(finance, aes(x = Year, y = Radiance)) +
  geom_line()
Roadrunner <- ggplot(finance, aes(x = Year, y = Roadrunner)) +
  geom_line()
Indigo + Kasios + Radiance + Roadrunner + plot_annotation(title = "Quaterly Earnings per Share ($) by Company")

Looking at the plot above, we see around late 2014 and early 2015, there was a steep decline in the earnings per share of both the above mentioned companies. Kasios quickly recovers, but Radiance does not.

We are suspicious that Kasios is the “bad player” and Radiance is being “framed” because Kasios’ quick fall and recovery looks too good to be true and the timing coincides with the fall of the Radiance company.

We are not suspicious of the Roadrunner Fitness Electronics and Indigo Sol Boards factories because there is either a stable increase or decrease in earnings per share around the time frame we are looking at (2014-2016).

The sensors 3 and 6 are also quite close in proximity to the Kasios and Radiance factories.

For the factories you identified, describe any observed patterns of operation revealed in the data.